library(tidyverse)
library(base)
library(patchwork)
library(scales)
library(ggridges)
library(patchwork)
library(ggseqlogo)
library(dplyr)
library(table1)Group 26 - Project
The analysis explores the correlation between illnesses, Right Heart Catheterization (RHC) treatment, and the Acute Physiology Score (APS) concerning mortality. Various plots are generated to create an understanding of the dataset, aiming to reveal potential relationships among the mentioned attributes. These visual representations serve as a tool to discover whether specific factors are associated with death.
Load data
rhc_aug <- read_csv("../data/rhc_aug.csv")To get a quick overview of some the variables in the dataset we can plot some very simple plots. We start out by visualizing sex against the age, and color by the death column,
rhc_aug |>
ggplot(aes(x = age,
y = sex,
color = death)) +
geom_point()It is now seen here that there is no apparent pattern for death based on the sex against age plot above.
Another thing that we might want visualize is the mean blood pressure against each diagnosis. Before the data was agumented the code below was used to show patient without disease as well with the legend label “0”. Though, as the agumented code data was implemented the plot below show the same as the next plot.
This plot is the same as the code above with color mapping for each diagnosis. It show the mean blood pressure for each diagnosis.
rhc_aug |>
ggplot(aes(x = Diagnosis,
y = meanbp1,
fill = factor(Diagnosis))) +
geom_point(aes(color = factor(Diagnosis)),
position = position_dodge(0.8),
alpha = 0.5) +
geom_boxplot(color = "black",
alpha = 0.5) +
labs(title = "Mean blood pressure against all diagnosis",
y = "Mean blood pressure",
color = "Diagonosis") +
guides(fill = guide_legend(title = "Diagnosis Status"),
color = FALSE) +
theme(legend.key.height = unit(0.7, 'cm'),
legend.key.width = unit(0.7, 'cm'))Already from these two plots we see that underlying data points are split into two groups and we will take this into consideration in a bit.
This plot doesn’t make much sense with use of rhc_aug data. Therefore, we will skip this as continuing with this will lead to a dead end.
The plot below show the mean blood pressure for patiens across all diagnoses and whether they had RHC performed on them. So we generally see picture of patients with lower blood pressure getting RHC performed on them.
rhc_aug |>
ggplot(aes(x = Diagnosis,
y = meanbp1,
fill = factor(swang1))) +
geom_point(aes(color = as.factor(swang1)),
position = position_dodge(0.8),
alpha = 0.5) +
geom_boxplot(alpha = 0.5,
outlier.shape = NA) +
labs(title = "Mean blood pressure for all diagnoses and RHC status",
y = "Mean blood pressure",
color = "RHC status",
fill = "RHC status") +
scale_color_manual(values = c("1" = "#619CFF", "0" = "#F8766D"),
labels = c("1" = "Yes", "0" = "No"))In the plot below we take into account the split in data points we see underneath each violin plot. This is due to the data following a bimodal distribution which has to peaks rather than the normal distribution which has one peak. The reason why we have chosen the violin plot is to signify the bimodal distribution.
meanbp_plot <- rhc_aug |>
ggplot(aes(x = cat1,
y = meanbp1,
fill = factor(swang1))) +
geom_point(aes(color = cat1),
position = position_dodge(0.8),
alpha = 0.5) +
geom_boxplot(alpha = 0.35,
outlier.shape = NA) +
scale_y_continuous(breaks = c(0, 50, 100, 150, 200),
labels = c(0, 50, 100, 150, 200)) +
labs(title = "Mean Blood Preassue Against Primary Diseases",
x = "Primary Disease Category",
y = "Mean Blood Pressure",
color = "Primary Disease Category") +
guides(fill = guide_legend(title = "RHC")) +
theme(axis.text.x = element_text(angle = 30,
hjust = 1,
vjust = 1))
meanbp_plotggsave(filename = "06_meanbp_plot_3.png",
plot = meanbp_plot,
device = "png",
path = "../results",
scale = 1,
limitsize = TRUE)This plot shows all the diseases and how many people have it = 1 or dont have the disease = 0.
subHX <- rhc_aug |> select(ends_with("hx"))
subHX <- subHX |> pivot_longer(everything(),
names_to = "variables",
values_to = "values")
subHX |> mutate(values = factor(values)) |>
ggplot(aes(x = variables,
fill = values)) +
geom_bar(position = "dodge") +
theme(axis.text.x = element_text(angle = 20,
hjust = 1))We begin by exploring if the number of comorbidity diseases influence the risk of death.
Comorbidities_risk <- rhc_aug |> mutate(death = factor(x = death,
levels = c(0,1),
c("No","Yes"))) |>
ggplot(aes(x = comorbidities,
fill = factor(death))) +
geom_bar(position = "stack") +
labs(title = "Risk of Dying by Number of Comorbidities",
x = "Number of Comorbidities",
y = "Number of Patients") +
scale_fill_manual(name = "Death",
values = c("No" = "darkolivegreen2",
"Yes" = "red"))
ggsave(filename = "06_Comorbidities_risk_4.png",
plot = Comorbidities_risk,
device = "png",
path = "../results",
scale = 1,
limitsize = TRUE)Now we want to include RHC treatment to see if this, together with comorbidities will influence the death rate.
rhc_aug |>
mutate(swang1 = factor(x = swang1,
levels = c(0,1),
c("no RHC","RHC"))) |>
ggplot(aes(x = comorbidities,
fill = factor(death))) +
facet_wrap(~swang1)+
geom_bar(position = "stack") +
labs(title = "Risk of dying by number of comorbidities",
x = "Number of comorbidities",
y = "Number of patients") +
scale_fill_manual(name = "Survival",
values = c("0" = "#619CFF", "1" = "#F8766D"),
labels = c('Dead', 'Alive')) +
theme_minimal()This plot shows all individuals who survided and died with and without the RHC treatment.
rhc_aug |>
mutate(swang1 = factor(x = swang1,
levels = c(0,1),
c("no RHC","RHC")),
death = factor(x = death,
levels = c(0,1),
c("alive","dead"))) |>
count(death,
swang1) |>
ggplot(aes(x = swang1,
y = n,
fill = death))+
geom_col(alpha = 1,
color="pink") +
labs(x = 'Treatment type',
y = 'n') +
theme(axis.text.x = element_text(angle = 20,
hjust = 1))We investigate how many survive and die depending on their APS score. APS score predicts death by taking both laberatory values and patient signs into account.
rhc_aug |>
ggplot(aes(x = aps1,
fill = factor(death))) +
geom_bar(position = "stack") And then we make the x-axis an interval to bin the observations.
rhc_aug |>
ggplot(aes(x = aps1_Interval,
fill=factor(death))) +
geom_bar(position = "stack")